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Abstract 


We- introduce  a  new  type  of  method  for  unconstrained  optimization,  which  we  call  a  tensor  method. 
It  is  related  in  its  basic  philosophy  to  the  tensor  methods  for  nonlinear  equations  of  Schnabel  and  Frank, 
but  beyond  that  the  methods  have  significant  differences.  The  tensor  method  for  unconstrained  optimiza¬ 
tion  bases  each  iteration  upon  a  fourth  order  model  of  the  objective  function.  This  model  consists  of  the 
quadratic  portion  of  the  Taylor  series,  plus  low  rank  third  and  fourth  order  terms  that  cause  the  model  to 
interpolate  already  calculated  function  and  gradient  values  from  one  or  more  previous  iterates.  We  show 
that  the  costs  of  forming,  storing,  and  solving  the  tensor  model  are  not  significantly  more  than  these  costs 
for  a  standard  method  based  upon  a  quadratic  Taylor  series  model.  Test  results  are  presented  for  sets  of 
problems  where  the  Hessian  at  the  minimizer  is  nonsingular,  and  where  it  is  singular.  On  all  the  test  sets, 
the  tensor  method  solves  considerably  more  problems  than  a  comparable  standard  method.  On  problems 
solved  by  both  methods,  the  tensor  method  requires  about  half  as  many  iterations,  and  half  as  many  func¬ 
tion  and  derivative  evaluations  as  the  standard  method,  on  the  average. 


1.  Introduction 


This  paper  describes  a  new  method,  called  a  tensor  method,  for  solving  the  unconstrained  optimiza¬ 
tion  problem 

given/  :Rn->R  ,  find*.  eRH  such  that/ (r.)  </(r)  for  all  x  e  D  (1.1) 

where  D  is  some  open  neighborhood  containing  x . .  We  assume  that  /(x)  is  at  least  twice  continuously 
differentiable,  and  that  n  is  of  moderate  size,  say  n  <  100.  Our  objective  is  to  create  a  general  purpose 
method  that  is  more  reliable  and  efficient  than  state-of-the-art  methods  for  solving  such  problems,  partic¬ 
ularly  in  cases  where  the  evaluation  of  /(x)  and  its  derivatives  is  expensive.  We  especially  intend  to 
improve  upon  the  efficiency  and  reliability  of  standard  methods  on  problems  where  V2/  (x. )  is  singular. 

The  distinguishing  feature  of  our  new  method  is  that  it  bases  each  iteration  upon  a  fourth  order 
model  of  /(x),  as  opposed  to  the  standard  quadratic  model.  The  third  and  fourth  order  terms  of  this 
model  have  special,  low-rank  forms  that  make  the  costs  of  using  the  higher  order  model  reasonable.  In 
particular,  in  comparison  to  standard  methods,  the  formation  and  use  of  the  fourth  order  tensor  model 
requires  no  additional  function  or  derivative  evaluations  per  iteration,  only  a  small  number  of  additional 
arithmetic  operations  per  iteration,  and  only  a  very  small  amount  of  additional  storage. 

The  tensor  method  approach  was  introduced  by  Schnabel  and  Frank  [1984,  1987],  who  describe 
tensor  methods  for  solving  systems  of  nonlinear  equations.  Their  methods  base  each  iteration  of  an  algo¬ 
rithm  for  solving  F(x)  =  0,  where  F  :  R’,—>RH,  upon  the  second  order  model 

MT(xc+d)  =  F(xc)  +  Jcd  +  ]/iTcdd  .  (1.2) 

Here  xc  is  the  current  iterate,  JceR nx*  is  the  Jacobian  matrix  F'(xc)  or  an  approximation  to  it,  and 
TceJ?"x"x"  is  a  low  rank  "tensor".  In  Schnabel  and  Frank’s  computational  experiments,  the  use  of  the 
tensor  methods  led  to  significant  improvements  in  efficiency  and  reliability  over  state-of-the-art  methods 
for  nonlinear  equations  that  are  based  upon  standard,  linear  models.  In  the  case  when  F\xc )  is  available. 
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the  average  reductions  measured  in  function  and  derivative  evaluations  ranged  from  20%  to  60%,  on  both 
nonsingular  and  singular  problems.  Frank  [1984)  also  proved  that  this  derivative  tensor  method  has  a 
three-step,  order  1.16  local  convergence  rate  on  problems  where  rank(F'(x. ))  =  n-I,  whereas  standard 
methods  are  linearly  convergent  under  these  conditions. 

The  tensor  method  described  in  this  paper  is  related  to  the  methods  of  Schnabel  and  Frank  in  its 
basic  philosophy,  but  it  is  not  a  straightforward  generalization  of  their  methods.  In  particular,  it  is  not  the 
application  of  the  model  (1.2)  to  the  problem  V/(.x)  =  0.  This  would  correspond  to  using  a  third  order 
model  of  /(*);  as  we  have  already  stated,  we  use  a  fourth  order  model  instead.  To  help  motivate  the 
basic  differences,  we  first  summarize  some  features  of  standard  methods  for  unconstrained  optimization. 

Standard  methods  for  solving  small  to  moderate  size  unconstrained  optimization  problems  base 
each  iteration  upon  a  quadratic  model  offix)  around  the  current  iterate  xc , 

m(xc+d)=f(xc)  +  gId  +  VidTHcd  .  (1.3) 

where  deR" ,  gceRn  is  V/  (*c )  or  a  finite  difference  approximation  to  it,  and  Hc  e  R  "  .  Such  methods 

can  be  divided  into  two  classes,  those  where  Hc  is  V2/  (xc)  or  a  finite  difference  approximation  to  it,  and 
those  where  Hc  is  a  secant  approximation  to  the  Hessian  formed  solely  from  current  and  previous  gra¬ 
dient  values.  In  this  paper  we  will  consider  standard  and  tensor  methods  of  the  first  type,  where  both 
V/(xc)  and  V2/ (xc )  are  available  analytically  or  by  finite  differences  at  each  iteration.  A  subsequent 
paper  will  discuss  tensor  methods  for  unconstrained  optimization  that  are  based  solely  on  function  and 
gradient  values. 

The  fundamental  method  for  unconstrained  optimization,  Newton’s  method,  is  defined  when 
V2/  (jcc)  is  nonsingular.  It  consists  of  using  Hc  =  V2/  (xc)  in  (1.3)  and  setting  the  next  iterate  x+  to  the 
critical  point  of  (13), 

r+  =  rc-VV(tc)-!V/(rc).  (1.4) 

The  basic  properties  of  Newton’s  method  are  well  known.  If  V2/  (jc-  )  is  nonsingular  at  a  local  minimizer 
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x . ,  and  the  initial  iterate  is  sufficiently  close  to  x. ,  then  the  sequence  of  iterative  generated  by  (1.4)  con¬ 
verges  quadratically  to  x . .  If  the  initial  iterate  is  not  sufficiently  close  to  x. .  then  the  iterates  produced  by 
Newton’s  method  may  not  converge  to  x • ,  but  they  may  be  made  to  converge  through  the  use  of  line 
search  or  trust  region  strategies  (see  e.g.  Fletcher  {1980],  Gill,  Murray,  and  Wright  [1981],  Dennis  and 
Schnabel  [1983]).  The  main  costs  of  unconstrained  optimization  methods  based  upon  Newton’s  method 
are  :  one  evaluation  of  V2/  (x),  and  one  or  more  evaluations  of  V/  (x)  and  /  (x),  at  each  iteration;  the 
solution  of  a  symmetric  nxn  system  ''f  linear  equations  at  each  iteration,  costing  a  small  multiple  of  n- 
arithmetic  operations;  and  the  storage  of  a  symmetric  n  xn  matrix. 

One  shortcoming  of  standard  unconstrained  minimization  methods  is  that  they  do  not  converge 
quickly  if  the  Hessian  at  the  minimizer,  V2/  ( x » ),  is  singular.  Griewartk  and  Osborne  [  1983]  have  shown 
that  in  this  case,  the  iterates  produced  by  (1.4)  generally  are  at  best  linearly  convergent,  even  if  V2/  (xc) 
is  non-singular  at  all  the  iterates.  Furthermore,  the  third  derivatives  do  not  supply  information  in  the 
direction(s)  where  the  second  derivative  matrix  is  lacking,  since  the  necessary  conditions  for  minimiza¬ 
tion  show  that  at  any  minimizer  x.  where  V2/  ( x . )  is  singular  with  null  vector  v ,  V3/  (*.  )wd  must  also 
be  0  for  all  de/?".  Thus,  adding  an  approximation  to  V3/  (xc)  alone  will  not  lead  to  better  than  linear 
convergence  for  such  problems.  An  approximation  to  the  fourth  derivative  V4/  (xc )  as  well,  or  at  least 
the  quantity  V4/  (xc  )ww ,  is  necessary  to  obtain  better  than  linear  convergence. 

This  need  for  fourth  order  information  in  order  to  obtain  fast  convergence  on  singular  problems  is 
one  reason  why  we  will  use  a  fourth  order  model,  rather  than  a  third  order  model,  in  our  tensor  methods 
for  optimization.  Other  reasons  are  that  a  third  order  model  is  unbounded  below,  even  though  it  may 
have  a  local  minimizer,  and  that  the  information  that  is  readily  available  in  an  optimization  algorithm, 
namely  values  of/(x)  and  V/(x)  at  previous  iterates,  naturally  supports  the  use  of  a  fourth  order  tensor 
model.  Note  that  these  conditions  are  quite  different  from  the  situation  for  systems  of  nonlinear  equa¬ 
tions,  where  an  approximation  to  F"(x> )  (analogous  to  V3/^* ))  is  sufficient  to  produce  faster  than  linear 
convergence  on  problems  where  the  Jacobian  at  the  solution  is  singular,  and  where  only  one  piece  of 
interpolation  information  (F  (x),  analogous  to  V/  (x))  is  readily  available  from  each  previous  iterate. 
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For  these  reasons,  we  have  based  the  tensor  methods  for  unconstrained  minimization  discussed  in 
this  paper  upon  the  fourth  order  tensor  model 

mT(xc+d)=f(xc)  +  Vf(xc)Td  dTV2f(xc)d  +  ^  Tc  ddd  +  Vc  dddd  (1.5) 

where  by  V/  ( xc )  ana  V2/  (xc)  we  mean  either  these  analytic  derivatives,  or  finite  difference  approxima¬ 
tions  to  them,  and  where  TceRnxn><Jt  and  are  symmetric.  (The  symmetry  of  Tc  and  Vc  is 

another  significant  difference  between  tensor  models  for  unconstrained  optimization  and  for  nonlinear 
equations,  where  Tc  is  not  symmetric.)  The  three-dimensional  object  Tc  and  the  four-dimensional  object 
Vc  are  referred  to  as  tensors,  hence  we  call  (1.5)  a  tensor  model,  and  methods  based  on  (1.5)  tensor 
methods.  Before  proceeding,  we  define  the  notation  concerning  these  tensors  that  is  used  above  and  in 
the  remainder  of  this  paper. 


Definition  1.1  Let  TeRnxnxn .  Then  for  u,v  ,w  ,eRn ,  TuvweR ,  TvweR" ,  with 


Tuvw  - 


n  n  n 


Tvw  [i  ]  = 


w[^]  ,1=1,  ,n  . 


Definition  1.2  Let  v eRnXH*A™ .  Then  for  r,u,v ,weRH ,  VruvweR,  VuvweRn  with 


Vruvw  = 


V[iJJcJ]r[i)u[j]v[k)w[l), 


1=17 


Vuvw  [i  ]  = 


V[iJJtJ]u\J]v[k]w[l]  .  i=l  ,•■•,/» 


/= 


The  obvious  choices  of  Tc  and  Vc  in  (1.5)  are  V3/  ( xc )  and  V4/  (j:,.);  these  would  make  (1.5)  the 
first  five  terms  of  the  Taylor  series  expansion  of  /  around  xc .  We  will  not  consider  using  the  actual 
higher  order  derivatives  in  the  tensor  model,  however,  because  the  cost  of  doing  so  would  be  prohibitive. 
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In  particular,  0(n*)  partial  derivatives  would  have  to  be  evaluated  or  approximated  at  each  iteration;  stor¬ 
ing  these  derivatives  would  take  0(n 4)  locations;  and  finding  a  minimizer  or  the  model  would  require  the 
solution  of  a  difficult  minimization  problem  in  n  variables  at  each  iteration.  Each  of  these  reasons  alone 
is  sufficient  to  reject  this  alternative  for  a  general  purpose  method,  although  we  note  that  for  some  func¬ 
tions  f  (x)  with  special  forms,  using  analytic  higher  order  information  can  be  viable  (see  Jackson  and 
McCormick  [1986]). 

Instead,  our  new  method  will  choose  Tc  and  Vc  in  (1.5)  to  be  simple,  low-rank  symmetric  approxi¬ 
mations  to  V3/  (xc)  and  V4/  (xc )  that  are  formed  from  previously  calculated  function  and  gradient  values. 
The  remainder  of  this  paper  will  show  how  we  efficiently  form  and  solve  such  a  tensor  model,  how  we 
incorporate  it  into  a  complete  unconstrained  optimization  algorithm,  and  what  the  computational  perfor¬ 
mance  of  this  algorithm  is.  Section  2  describes  how  we  form  the  tensor  model,  and  shows  tliat  this 
requires  only  a  small  multiple  of  n2  additional  arithmetic  operations  per  iteration,  and  a  sma’J  multiple  of 
n  additional  storage  locations.  In  Section  3  we  show  how  we  solve  this  model  using  only  0(n2)  more 
operations  per  iteration  than  the  0(r t3)  operations  that  are  needed  by  the  standard  quadratic  model.  A  full 
tensor  algorithm  for  unconstrained  optimization  is  presented  in  section  4.  In  section  5,  we  present  test 
results  of  our  tensor  method  on  problems  from  Morti,  Garbow  and  Hillstorm  [1981],  and  on  modifications 
of  these  problems  constructed  so  that  V2/  (. x . )  is  singular.  We  compare  these  results  to  those  obtained 
from  a  state-of-the-art  algorithm  that  uses  the  standard  quadratic  model  (1.3)  but  is  identical  to  the  tensor 
algorithm  in  virtually  all  other  respects.  We  briefly  summarize  our  research  and  comment  on  possible 
extensions  of  this  work  in  Section  6. 

We  will  denote  members  of  a  sequence  of  n  -vectors  x  by  (** },  where  each  x *eJ?" ,  and  to  avoid 
ambiguity  with  this  notation,  we  will  continue  to  denote  components  of  a  vector  veRn  by  v[t]efl.  We 
will  also  abbreviate  terms  of  the  form  dd,  ddd,  and  dddd  in  our  tensor  models  by  d2,  d3,  and  d*  respec¬ 
tively. 
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2.  Forming  the  Tensor  Model 

Now  we  discuss  how  we  select  the  tensor  terms  TceRnxn>,Jl  and  vce  Rn'J'*J'Kn  in  the  model 

mT(xc+d)  =/(;tc)+V/(xc)rd+4-drV2/  (Xc)d+jTcd3+^Vcd4  (2.1) 

We  have  already  stated  that  Tc  and  Vc  will  not  contain  actual  third  and  founh  denvative  iruoimation. 
Instead,  we  will  use  the  third  and  the  fourth  order  terms  in  (2.1)  to  cause  the  model  to  interpolate  function 
and  gradient  information  that  has  been  computed  at  some  previous  iterations  of  the  optimization  algo¬ 
rithm.  In  particular,  we  will  select  p  not  necessarily  consecutive  past  iterates  x_i,  •  •  • ,  x_p,  and  ask  that 
the  model  (2.1)  interpolate  f  (x)  and  V/(x)  at  these  points,  i.e. 

/  (x_*)=/(jcc)+V/ (xc)T Sk+jsfV2/  (xc)sk+^Tcsi?+-^-Vcsk*.  k  =  \,  p  (2.2a » 

V/ (*_*)  =  V/ (xc HV2/ (xc )sk  +-j Tc sk2+^ Vc sk\  k-l.  p  (2.2b) 

where 

sk=x.k-xc ,  k=\.  ■  p.  (2.2c ) 

First  we  briefly  summarize  how  the  past  points  x_i,  •  •  ■  j-p  are  selected.  Then  we  discuss  how  we  select 
Tc  and  Vc  so  that  (2.2)  is  satisfied. 

The  set  of  past  points  used  in  the  interpolation  conditions  (2.2)  is  selected  by  the  procedure  given  in 
Schnabel  and  Frank  [1984],  We  always  select  the  most  recent  previous  iterate,  and  then  select  each 
preceding  past  iterate  if  the  step  from  it  to  xc  makes  an  angle  of  at  least  8  degrees  with  the  subspacc 
spanned  by  the  steps  to  the  already  selected,  more  recent  iterates.  Values  of  0  between  20  and  45  degrees 
have  proven  to  be  best  in  practice;  therefore  the  selected  directions  (r* )  are  strongly  linearly  independent. 
This  procedure  is  easily  implemented  using  a  modified  Gram-Schmidt  algorithm.  We  also  set  an  upper 
bound 


p<n13  . 


(2.3) 
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on  the  number  v  past  points.  This  bound  was  motivated  by  computational  experience  that  showed  that 
using  more  than  about  n 1/3  interpolation  conditions  rarely  helped  much,  and  also  by  the  desire  to  keep  the 
■borage  and  arithmetic  costs  of  our  tensor  method  low.  In  fact,  however,  our  computational  results  will 
show  that  the  strong  linear  independence  criterion  discussed  above  usually  limits  p  far  more  severely 
than  (2.3). 

Now  we  discuss  how  we  choose  Tc  and  Vc  to  satisfy  (2.2).  First  we  show  that  the  interpolation 
conditions  (2.2)  uniquely  determine  Ts?  and  Vs*4  for  each  k  =  1,  •  •  ,p.  Multiplying  (2.2b)  by  sk  gives 

Vf(x.k)Tsk  =  ^/(xc)Tsk+s[V2f(xc)Sir+^-Tcski+~VcSki'  k= 1,  p  (2.4 1 

Let  the  unknown  quantities  a,  pe  Rp  be  defined  by 

a[k]  =  Tsk3,  (2.5a) 

(3[*]  =  VV.  (2.5b) 

for  k-\,  ■  ■  p .  Then  from  (2.2a)  and  (2.4)  we  have  the  following  systems  of  two  linear  equations  in  two 
unknowns  for  each  of  the  p  pairs  a[i  ]  and  p{/k  )  : 


]  =  <?!(*]  , 

(2.6a) 

]=?,[*]  . 

(2.6b) 

where  <71,  qpeRp  arc  defined  by 

<7i[*  1  =  V/  (x_*)r sk-Vf  (xc)T sk-s{V2f  (xc  ).vt 

qzik  !  =f(x.k)-f  (xcy-Vf  (xc)T  sk-\sJV2f  (xc  )sk , 

Am 

for  k=l,  ■  ,p,  The  system  (2.6)  is  nonsingular,  so  each  cx[>k  j  and  (3[£  ]  is  uniquely  determined. 

Thus  for  each  k,  our  interpolation  conditions  uniquely  dcierm ine  Vjt4  and  Tsk3,  and.  from  (2.2b), 
leave  us  with  np  linear  equations  in  0  ( n 4)  unknowns  to  determine  the  p  terms  Tsp+-jVsk3.  These  are  the 
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only  remaining  interpolation  conditions,  meaning  that  the  choices  of  T  and  V  are  vastly  underdetermined. 

We  have  chosen  to  select  T  and  V  from  among  the  infinite  number  of  possibilities  by  first  choosing 
the  smallest  symmetric  V ,  in  the  Frobenius  norm,  for  which 

VV  =  P(*1.  k=l,  p 

where  (3[£  ]  is  calculated  by  (2.6).  The  rationale  behind  this  choice  is  to  use  the  smallest  fourth  order  term 
consistent  with  the  interpolation  conditions,  thereby  modeling  as  much  of  the  function  and  gradient  infor¬ 
mation  as  possible  with  a  third  order  model.  This  choice  also  will  give  us  the  necessary  fourth  order 
information  in  the  singular  case.  We  then  substitute  this  value  of  V  into  (2.2b),  obtaining 

Tc  Sf  -  ak ,  k-l,  ■  ,p  (2.7a) 

where 

ak  =  2 (V/ (x~k  yvf  (xc y~V*f  (Xc )— g- VsJ),  k=],  ■  ,p  (2.7b) 

This  is  a  set  of  np<n4/3  linear  equations  in  n3  unknowns  Tc[i  ,jjc],  \<i,j,k<n.  Finally  we  choose  the 
smallest  symmetric  Tc,  in  the  Frobenius  norm,  which  satisfies  the  equations  (2.7).  The  use  of  the 
minimum  norm  solution  here  is  consistent  with  the  tensor  method  for  nonlinear  equations,  and  will  again 
be  a  key  to  the  efficiency  of  our  method  because  it  will  cause  Tc  and  Vc  to  have  low  rank. 

The  solutions  to  the  minimum  norm  problems  that  determine  Vc  and  Tc  are  given  by  Theorems  2.2 
and  2.3.  We  note  that  deriving  the  minimum  norm  Tc  is  much  more  difficult  than  in  the  tensor  method 
for  nonlinear  equations,  because  of  the  symmetry  requirement.  First  we  define  three  and  four  dimensional 
rank  one  tensors,  which  will  feature  prominently  in  these  theorems  and  the  remainder  of  this  paper.  Also 
recall  that  the  Frobenius  norm  of  any  matrix  or  tensor  A  ,  denoted  1  |  A  \  |  f ,  is  the  square  root  of  the  sum 
of  the  squares  of  all  the  elements  of  A  . 

Definition  2.1  Let  u.v.wpte/?" .  The  tensor  T €  R n x" *"  for  which  T[i  ,j  ,k  ]=u  [i  ]*v  \J  ]*w  [k], 
\<i  ,j  ,k  <n  is  called  a  third  order  rank  one  tensor  and  will  be  denoted  T =uvw .  The  tensor  Vt  Rn  ™  x" x" 
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for  which  V  [j  ,j  Jc ,/  ]=u  [i  ]* v  [j  ]*w  [k  ]'**  [/  ],  1  <i  J  ,k  ,1  <n  is  called  a  fourth  order  rank  one  tensor  and  will 
be  denoted  V  =uvwx . 


Theorem  2.2  Let  p<n,  let  sk&Rn  ,k=  1,  ...  ,p  with  {s*}  linearly  independent,  and  let  (3eftp.  Define 
MeRpxp  by  M [1  ,j ]=(s?Sj )4,  1  <i,j<p ,  and  define  ye Rp  by  y=A/~'p.  Then  the  solution  to 

minimize  1  I  Vc  ]  U  subject  to  V’c^4=B[/kl,  k- 1,  ■  jj  and  Vc  svmmetric  p  q\ 

Vte  «*»•»»  " 


is 


Vc=jtlYk(skskskSk)  ■ 


(2.9) 


Proof.  Let  VeR"4  be  defined  by  VT  =  (  Vc  [l,l,l,l]tVe  [1,1, U],  ■  •  • ,  Vc[\,\.\,n],Vc  [1,1, 2,1], 
,Vc[l,l,2,n],  •  •  • ,  Vz[n,n,n,n] ).  Also  let  seRpXA\  be  defined  so  that  row  k  of  s'  is  equal  to 
( (5i[l])4.(5*[l])3(5*[2]),(Si[l])3(Si[3])  ,  • ,  (i*[l])3(s*[n]),  ■  •  •  ,(s*[*])4  ),  i.e.  the  same  order  of  sub¬ 

scripts  as  in  V.  Then  (2.8)  is  equivalent  to 

minimize  |  |  V  |  1 2  subject  to  s*  V  =  (3  and  Vc  symmetric  , 

where  Vc  is  the  original  form  of  V.  Since  (j*  }  are  linearly  independent,  /  has  full  row  rank.  The  solu¬ 
tion  to 

minimize  |  I V  |  1 2  subject  to  s'  V  =  (3 

is  V  =s‘T (sY '  ■'  }.  By  straightforward  algebra,  sfT=M .  Thus  T=/ry,  which  by  reversing  the  transforma¬ 
tion  faun  V  back  to  Vc  is  equivalent  to  (2.9).  Since  Vc  is  symmetric,  it  is  the  solution  to  (2.8).  □ 

Theorem  2.3  Let  p<n ,  let  skeRH,  k=\,  -  ,p  with  {s*}  linearly  independent,  and  let  akeRn, 
k—l,  ■  ■  ■  p .  The  solution  to 
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minimize  |  \TC  |  If  subject  to  7cj,s,=a1.  i=l,  ,p  and  7C  symmetric  (2.10) 

7",  6  /?*** 

is 

Tc^f^bkSkSk+SkbkSk+SkSkbk  (2.11) 

where  Jt=l,  •  ■  ■  j) ,  and  (6* }  is  the  unique  set  of  vectors  for  which  (2.11)  satisfies  7C  j,- =a, , 

i=l,  ■  •  ,p. 


Proof.  First  we  show  that  the  constraint  set  in  (2. 10)  is  feasible.  Let  r,- e  R n ,  i =1 ,  ■  •  j> ,  obey 


*7*7  - 


1.  »=/ 

0,  i*j 


for  7=1,  •  •  ■  ,p .  Such  vectors  r,  are  obtainable  via  a  QR  factorization  of  the  matrix  whose  columns  are  the 


s , .  Then 


7 = fjr,-  r,<2,  +r,  a,  r,  +a,  /,  /,  ~2(a,rj, )(/,  /,  /,■ ) 

is  a  feasible  solution  to  (2.10). 

Dennis  and  Schnabel  [1979]  show  that  if  the  constraints  in  (2.10)  are  satisfiable,  then  the  set  of  ten¬ 
sors  T;eRnXAXn  generated  by  the  procedure  7 o=0,  and  for  all  7  =0,1, 2,  •  •  ■ ,  T27+1  is  the  solution  of 

minimize  |  \TiJ+\-T2j  I  If  subject  to  727+iJ,.r,=a,,  i=l,  ••  (2.12) 

and  72y+2  is  the  solution  of 

minimize  |  |  Tij+i-Tjj +1 1  If  subject  to  T^j^i  is  symmetric, 
has  a  limit  which  is  the  unique  solution  to  (2.10). 


Next  we  show  that  this  limit  has  the  form  (2.1 1)  for  some  set  of  vectors  [bk },  by  showing  that  each 


7 27  has  this  form.  This  is  trivially  true  for  7 q.  Assume  it  is  true  for  some  fixed  j ,  i.e. 
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T2j=^UkSkSk+SkUkSk+SkSkUk  (2.13) 

for  some  set  of  vectors  [uk  }.  Then  from  Schnabel  and  Frank  [1984],  the  solution  to  v2. 12)  is 

T 2;  +  l=T 2j  vksk  sk 

for  some  set  of  vectors  { v* } .  Thus 

T 2 j+2r=T 2j  +  -j  (J^  V*  Sk  Sk+Sk  V*  Sk  +Sk  Sk  Vk  ) 
=^(uk+-J-)SkSk+Sk(Uk+-^-)Sk+SkSk(uk+-y)  . 

which  again  has  the  form  (2.13).  Thus  by  induction  the  solution  Tc  to  (2.10)  must  have  the  form  (2.11) 
for  some  set  of  vectors  {bk } . 

Finally  we  show  that  the  set  of  vectors  ( bk  }  for  which  Tc  given  by  (2.11)  satisfies 

TcSiSi=ait  i=l,  ^?  (2.14) 

is  unique.  This  will  mean  that  the  equations  (2.1 1)  and  (2.14)  uniquely  determine  the  solution  to  (2.10). 
Substituting  (2.11)  into  (2.14)  gives  a  system  of  np  linear  equations  in  np  unknowns,  where  the  matrix  is 
a  function  of  the  {sk ),  the  unknowns  are  the  elements  of  the  \bk  },  and  the  right  hand  side  consists  of  the 
elements  of  the  [ak }.  Since  we  showed  above  that  (2.10)  is  feasible  for  any  {ak ),  the  above  derivation 
and  the  theory  of  Dennis  and  Schnabel  [1979]  imply  that  for  any  set  [5* },  this  linear  system  has  at  least 
one  solution  for  any  right  hand  side.  Thus  the  linear  system  must  be  nonsingular  and  have  a  unique  solu¬ 
tion.  This  means  that  the  set  of  vectors  { bk  }  is  uniquely  determined,  and  completes  the  proof.  □ 

Theorems  2.2  and  2.3  show  that  Tc  and  Vc  determined  by  the  minimum  norm  problems  (2.10)  and 
(2.8)  have  rank  2 p  and  p ,  respectively.  This  is  the  key  to  making  the  tensor  model  efficient  to  store  and 
solve.  However,  while  the  proof  of  Theorem  2.3  shows  constructively  that  there  is  a  unique  Tc  of  the 
form  (2.11)  that  satisfies  (2.10),  it  does  not  give  an  efficient  algorithm  for  finding  it,  since  the  proof 
involves  solving  a  system  of  np  linear  equations  in  np  unknowns.  We  now  present  an  efficient  method 


12 


for  finding  Tc . 

Substituting  (2.11)  into  (2.14)  gives  the  equations 

a.  =(^&*  sk  +Sk  bk  sk  +sk  sk  bk  )i,  Si 


=  'Lbk(sIsl )2+2^sk(sI st )(bTst )  , 


i=\,  ■  ■  ,p  in  the  unknowns  [bk }.  We  can  write  these  equations  in  the  matrix  form 

A  =B  N +2S  M 


(2.15) 


where  A eRH*p ,  with  column  k  of  A=ak,  BeRnxp ,  with  column  k  of  B=bk,  SeRnxp ,  with  column  k  of 
S=sk,  and  IV ,  M  eRpxp  with  Ntj={slsj)2  and  Mij=(sfsj)(bfsj),  l<i  ,j<p .  Note  that  B  contains  the  unk¬ 
nowns,  that  M  is  a  linear  function  of  these  unknowns,  and  that  A  ,  N ,  and  S  are  known.  Pre-multiplying 
both  sides  of  (2.15)  by  ST  gives 

[5Ti4  ]  =  [SrB]N  +  2[SrS  ]M  (2.16) 

Defining  Xij^bjsi ,  1  Si  ,j<p ,  we  can  rewrite  (2.16)  in  the  form  of  p2  linear  equations  in  the  p 2  unknowns 
xu 


(2.17) 


where  each  w;y  in  the  second  matrix  of  (2.17)  is  a  p  -vector  given  by  Wij={(sfs  i)(sjsj),  (sjsi){s\sj ),  ■  ■  ■ 

,  (sfsp)(s£sj)]T .  The  only  unknowns  in  (2.17)  are  the  xtJ ,  so  we  can  solve  (2.17)  for  Xy  ,  and  then  com¬ 
pute  M  by 

Mij  =  (s,7 Sj  ){bfsj )  =  (sJsj  )xiJ 


r 

- 

p 

- 

r  1 

sfal 

N 

*11 

Wll 

r 

-til 

s(a2 

X 12 

x\2 

. 

— 

+ 

Wl  p 

Spap 

Xpp 

wp  I 

wpp 

Xpp 

. 

■ 

. 

“  J 

• 

J 

L  J 

Finally,  from  (2.15),  we  can  compute  B  by 


13 


B  =(A  -2SM)N~'  .  (2.18) 

Note  that  N  is  symmetric  and  positive  definite  since  the  {s* }  are  linearly  independent. 

We  conclude  this  section  by  summarizing  the  costs  to  form  and  store  the  tensor  model.  The  dom¬ 
inant  costs  of  the  process  for  calculating  Tc  that  is  summarized  in  equations  (2.17)  and  (2.18)  are  np2 
each  multiplications  and  additions  for  calculating  STA ,  the  same  cost  for  calculating  S*M ,  the  same  cost 
again  for  the  backsolves  in  (2.18),  roughly  np2l 2  each  multiplications  and  addition^  ior  calculating  s?Sj 
for  \<i<j<p ,  and  p6l 3  each  multiplications  and  additions  for  solving  the  system  (2.17),  for  a  total  of 
(' V2)np 2  +  p6l 3  each  multiplications  and  addiuons.  Since  p  <n !/3,  these  are  all  O  (n2)  or  less.  The  addi- 
donal  costs  of  forming  Vc  are  negligible,  at  most  0(p3).  In  addition,  the  cost  of  forming  the  interpolation 
equations  (2.2)  includes  the  multiplication  V2/ ( xc)Sk  for  k-l,  •  ,p  which  requires  n2p  each  multipli¬ 
cations  and  additions.  This  is  generally  the  largest  cost  of  forming  the  tensor  model.  The  Gram-Schmidt 
process  for  selecting  the  {sk}  requires  about  n5ri  arithmetic  operations  if  nxri  vectors  are  considered.  In 
summary,  only  a  small  multiple  of  n2  additional  arithmetic  operations  are  required  to  form  the  model. 
We  will  see  in  section  5  that  usually  p= 1,  so  that  the  total  additional  cost  per  iteration  is  usually  n2  +  n5r3 
+  O  (n )  each  additions  and  multiplications  per  iteration. 

The  storage  required  for  forming  anc  storing  the  tensor  model  is  also  small.  The  tensor  terms  7C 
and  Vc  themselves  require  only  the  storage  of  the  vectors  bk  and  s*,  which  takes  2np  <  2nv3  storage 
locations.  In  addition,  the  model  formation  process  requires  at  most  2n4/3  storage  locations  for  storing 
nl/3  each  past  iterates  and  their  gradients,  np  <  n 4/3  storage  locations  for  intermediate  quantities  in 
(2.18),  and  p 4  <,  n4/ 3  storage  locations  for  the  factorization  in  solving  (2.17).  Thus  the  total  additional 
storage  is  at  most  6nAr}. 
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3.  Solving  the  Tensor  Mode! 

In  Section  2  we  showed  how  to  find  a  rank  2 p  tensor  Tc  of  the  form  (2. 1 1),  and  a  rank  p  tensor  Vc 
of  the  form  (2.9),  for  which  the  tensor  model  (2.1)  interpolates  the  function  and  gradient  values  at  p 
(<n 1/3)  past  iterates.  Substituting  these  values  of  Tc  and  Vc  into  (2.1),  the  tensor  model  has  the  form 

mr(xc+d)  =f(xe)  +  Vf  (x C)T d  +  ydrV2/ (xc )d  +  y  (bjd ) (sjd. )2  +  yk  {s[d  f  .  (3.1) 

In  this  section  we  show  how  to  efficiently  find  a  minimizer  of  this  model.  Although  equation  (3.1)  is  a 
fourth  order  polynomial  in  n  variables,  we  will  show  how  to  reduce  its  minimization  to  the  minimization 
of  a  fourth  order  polynomial  in  p  variables  plus  a  quadratic  in  n-p  variables.  For  conciseness  we  use  the 
notation  g  =V/  (xc )  and  //=V2/  (xc )  for  the  remainder  of  this  section. 

Let  Se/?"XP,  where  column  k  of  S  is  s*.  and  the  {s*)  are  linearly  independent.  Also  let 
and  WeR**?  have  full  column  rank  and  satisfy  ZTS- 0  and  WTS=I  respectively.  (Z  and 
W  can  be  calculated  through  the  QR  factorization  of  S ;  the  efficient  implementation  of  the  operations 
involving  Z  and  W  is  discussed  later  in  this  sectioa)  Then  we  can  write  d  in  (3.1)  in  the  form 

d  =  Wu+Zt  (3.2) 

where  ueRp ,  teRn~P .  Substituting  (3.2)  into  (3.1)  gives 

mT(.zc+Wu+Zt)=f(xc)  +  gTWu  +  gT  Zt  +  ^-uT  WT  HWu  (3.3) 

+  uTwTHZt  +  j-t? ZTHZt  +  |  ^  u*2 (b[Wu  +b{Zt)  +  -^^  yk  ukA  . 

Equation  (3.3)  is  a  quadratic  with  respect  to  t.  Therefore,  for  the  tensor  model  to  have  a  minimizer, 
ZT HZ  must  be  positive  definite  and  the  derivative  of  the  model  with  respect  to  t  must  be  0,  i.e. 

ZT g+ZTHZt+ZTWTHu+^ZT  fb.Ui^O  (3.4) 


which  yields 


IS 


fHZr  HZy{ZT  Cg  +HWu  +-J- fjb,  up). 


(3.5) 


Thus,  if  ZT HZ  is  positive  definite,  substituting  (3.5)  into  (3.3)  reduces  the  problem  of  minimizing  the 
tensor  model  to  finding  a  minimizer  of 


tf*r(M)=/  +gTWu+lrUTWT  HWu  +  Jr 


u,Wv*'«>+^ri| 


Y iUi 


(3.6) 


-j( g+HWu  +\%bi  up)TZ  (ZTHZ)~'ZT(g  +HWu  +-J- 1 'b:  up) 

which  is  a  fourth  degree  polynomial  in  p  variables.  If  (3.6)  has  a  minimizer  u.  ,  then  the  mirtimizer  of  the 
original  tensor  model  (3.1)  is  given  by  =  Wu«+Zf. ,  where  r.  is  determined  by  setting  u  =  u*  in  (3.5). 
Note  that  this  process  is  well  defined  even  if  H  is  singular,  as  long  as  ZT HZ  is  nonsingular  and  positive 
definite.  This  is  possible  if  rank (//)  >n-p. 

There  are  several  possible  difficulties  with  this  process.  First,  (3.6)  may  have  multiple  minimizers. 
If  p  ,  we  can  find  the  minimizers  analytically,  and  if  there  are  two,  we  choose  the  value  of  u.  that  is  in 
the  same  valley  of  the  function  mT(u)  as  u=0.  This  choice  can  be  shown  to  guarantee  that  there  is  a 
(nonlinear)  descent  path  from  xc  to  xc+d*  for  the  model  mj (xc  +d ).  If  p>  1  we  minimize  (3.6)  with  a 
standard  unconstrained  minimization  code  (starting  from  u= 0)  and  use  the  minimizer  it  returns.  We  have 
found  that  these  procedures  generally  produce  a  desirable  minimizer. 

Secondly,  the  tensor  model  may  not  have  a  minimizer,  either  because  ZT HZ  is  not  positive 
definite,  or  because  (3.6)  has  no  minimizer  when  ZT  HZ  is  positive  definite.  Finally,  even  if  (3.6)  has  a 
minimizer  d» ,  xc+d -  may  not  be  an  acceptable  next  iterate.  These  difficulties  are  addressed  by  using  a 
global  strategy. 

We  have  tried  both  line  search  and  trust  region  global  strategies  in  conjunction  with  our  tensor 
method.  The  line  search  strategy  we  used  is  simple  :  if  (3.6)  has  a  minimizer  d which  is  in  a  descent 
direction,  but  xc+d •  is  not  an  acceptable  next  iterate,  we  set  x+  =  xc+Xd-  for  some  Xe  (0,1]  using  a  stan¬ 
dard  line  search.  If  (3.6)  has  no  minimizer,  or  d .  is  not  in  a  descent  direction,  we  find  the  next  iterate  by 
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using  a  line  search  algorithm  based  on  the  standard  quadratic  model  (1.3).  The  tensor  method  based  on 
this  strategy  has  performed  quite  well  (see  Section  5),  but  we  find  that  about  40%  of  the  iterations  cannot 
use  the  tensor  model.  In  order  to  make  fuller  use  of  the  tensor  model,  we  have  also  tried  a  trust  region 
strategy,  which  is  the  method  that  we  concentrate  on  in  this  paper. 

The  trust  region  method  approximately  solves  the  problem 

minimize  mj  ( xc  +d)  subject  to  1  \d  |  1 2  ^  8  (3  7^ 

where  5e/?  is  the  trust  radius  that  is  adjusted  at  each  iteration.  This  is  a  standard  type  of  approach  for 
unconstrained  optimization,  see  for  example  Fletcher  [1980],  Dennis  and  Schnabel  [1983],  Efficient 
methods  exist  for  solving  the  trust  region  problem  with  quadratic  models  (see  e.g.  Mord  and  Sorensen 
[1983],  but  it  is  quite  difficult  to  extend  them  to  the  tensor  model.  For  this  reason,  in  order  to  test  the 
trust  region  tensor  method  approach  initially,  we  used  a  penalty  method  to  solve  (3.7).  This  means  that 
we  solve  (3.7)  by  solving  a  sequence  of  unconstrained  optimization  problems  of  the  form 

minimize  mr  (xc  +d)  +  o  (djdp  -82)2  (3  g) 

for  increasing  positive  values  of  the  scalar  o.  (The  details  of  selecting  o  are  given  in  Chow  [1989].)  As  in 
most  trust  region  algorithms,  we  only  solve  (3.7)  approximately;  in  our  implementation  we  stop  when  a 
solution  a)  to  (3.7)  satisfies  |  |d*(a)|  j  e  [0.955,  1.055].  This  means  that  a  does  not  grow  unbound¬ 
edly,  and  in  practice  a  small  number  of  problems  of  the  form  (3.8)  are  solved  per  iteration.  The  penalty 
approach  is  only  intended  for  initial  test  purposes,  because  it  increases  the  cost  of  each  iteration  consider¬ 
ably  due  to  the  cost  of  solving  (3.8),  although  it  does  not  increase  the  cost  in  function  and  derivative 
evaluations.  We  will  see  that  our  best  results  so  far  have  been  obtained  when  p  is  constrained  to  be  1  at 
each  iteration;  an  efficient  but  complicated  method  for  solving  (3.7)  in  this  case  is  given  in  Chow  [  1989]. 

Finally,  we  discuss  the  costs  of  solving  the  tensor  model.  The  main  additional  calculations  in 
finding  a  minimizer  of  the  tensor  model,  in  comparison  to  minimizing  a  standard  quadratic  model,  are  the 
calculations  involving  the  matrices  Z  and  W .  These  are  performed  by  calculating  the  decomposition  S  = 


Q  R,  where  QeR1'™  is  an  orthogonal  matrix  that  is  the  product  of p  Householder  transformations,  and 
ReRnxp  consists  of  an  upper  triangular  matrix /?]  in  its  firsts  rows,  and  is  0  otherwise.  ( Q  is  not  actu¬ 
ally  formed,  rather  the  p  n  -vectors  that  determine  the  p  Householder  transformations  are  stored,  see  e.g. 
Stewart  [1970].)  Also  let  ReR**?  consist  of  (/? x)~l  in  its  first  p  rows  and  0  otherwise.  Then  IV  =  Q  R , 
so  for  any  veRn  we  can  calculate  WTv  in  2 np  each  multiplications  and  additions  by  applying  the  p 
Householder  transformations  for  QT  followed  by  0(p2)  operations  to  apply  (/?i)'.  Similarly  Z  =  Q  I 
where  /  is  0  in  its  first  p  rows  and  the  identity  matrix  in  its  bottom  n-p  rows.  Thus  for  any  ve/?n  we 
can  calculate  ZTv  in  2 np  each  multiplications  and  additions  by  applying  QT  and  then  /.  Using  these 
techniques,  it  is  straightforward  to  verify  that  all  the  calculations  in  the  tensor  method  that  involve  Z  and 
W,  as  well  as  the  QR  decomposition  of  S ,  can  be  performed  in  4 n2p  +  0(np2)  each  multiplications  and 
additions  per  iteration;  the  leading  term  comes  from  calculating  HQ  and  then  QT HQ . 

The  other  costs  of  minimizing  the  tensor  model  are  (n-p)2/6  each  multiplications  and  additions  for 
the  factorization  of  ZT HZ,  and  the  cost  of  minimizing  the  fourth  order  polynomial  in  p  variables  (3.6), 
which  is  negligible  in  comparison  to  the  0  (n3)  cost,  especially  when  p  =  1.  Thus  the  total  cost  of  minim¬ 
izing  the  tensor  model  is  only  a  small  multiple  of  n2p  operations  more  than  the  n3/ 6  cost  of  finding  a 
minimizer  of  a  standard  quadratic  model.  Since  p<n 1/3  and  we  will  see  that  usually  p  =  1,  this  is  a  very 
small  additional  cost. 

At  many  iterations,  the  tensor  model  has  a  minimizer  which  is  accepted  as  the  next  iterate,  so  these 
are  the  only  costs  of  solving  the  tensor  model.  If  a  global  strategy  is  needed,  then  the  line  search 
described  above  can  be  implemented  with  about  the  same  cost  as  for  a  standard  quadratic  model,  since 
given  the  factorization  of  ZT HZ  wc  can  also  factor  H  using  only  0(n2p)  additional  operations.  In  the 
case  p= 1,  the  trust  region  strategy  can  also  be  implemented  as  efficiently  as  in  the  quadratic  case,  i.e. 
requiring  the  minimization  of  the  tensor  model  at  each  inner  iteration,  by  using  the  techniques  in  Chow 
[1989],  The  penalh'  approach  is  more  expensive  but  is  only  intended  for  test  purposes. 
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4.  The  Complete  Tensor  Method  Algorithm 

An  outline  of  the  complete  tensor  method  algorithm  that  we  used  in  our  computational  tests  is 
given  in  Algorithm  4.1.  The  remainder  of  this  section  comments  on  several  aspects  of  this  algorithm  that 
have  not  yet  been  discussed. 

Algorithm  4.1  --  An  iteration  of  the  tensor  method.  Given  xc ,  f(xc),  8C  : 

1.  Calculate  Vf  (xc)  and  decide  whether  to  stop.  If  not: 

2.  Calculate  V2/  (xe). 

3.  Select  p  past  points  to  use  in  the  tensor  model  from  among  the  n 1/3  most  recent  past  points. 

4.  Calculate  the  terms  Tc  and  Vc  in  the  tensor  model,  so  that  the  tensor  model  interpolates  /  (x)  and 

V/  (x )  at  all  the  points  selected  in  step  3. 

5.  Find  a  potential  acceptable  next  iterate  Xc  +dr  and  a  potential  new  trust  radius  5 r  using  the  tensor 

model  and  a  trust  region  strategy. 

6.  Find  a  potential  acceptable  next  iterate  xc+dn  and  a  potential  new  trust  radius  5W  using  the  qua¬ 

dratic  model  and  a  trust  region  strategy. 

7.  \lf(xc+dT)<=f(xc+dN) 

then  set  x+  =  xc  +dT  and  5+  =  5 j 

else  set  =  xc  +d^j  and  5+.  =  5,v . 

8.  Setx c=x+,f(xcy=f  (x+),  5C  =  8+,  go  to  step  1. 

The  most  important  feature  of  Algorithm  4.1  that  has  not  been  previously  discussed  is  that  at  each 
iteration,  we  calculate  a  potential  new  iterate  based  on  the  quadratic  model,  as  well  as  a  potential  new 
iterate  based  on  the  tensor  model.  This  means  we  perform  a  full  global  strategy  using  each  model,  result¬ 
ing  in  two  points  xc+dr  and  xc+dn  which  both  give  sufficient  decrease  in  f(x)  to  be  acceptable  as  the 
next  iterate.  Then  we  choose  the  one  with  the  lower  function  value  as  the  next  iterate.  Even  though  this 
strategy  increases  the  cost  of  each  iteration  by  at  least  one  function  evaluation  (since  it  is  necessary  to 
evaluate  f  (x)  at  both  xc+dj  and  xc+ds,  and  maybe  at  some  unsuccessful  trial  points,  in  the  global 
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strategies),  we  have  found  that  this  approach  substantially  improves  the  efficiency  of  our  tensor  method  as 
measured  in  function  and  derivative  evaluations,  as  well  as  in  iterations.  We  have  not  yet  found  a  wav  to 
achieve  the  same  efficiency  without  requiring  the  use  of  both  models  at  each  iteration. 

Finally  we  discuss  some  details  of  the  steps  of  Algorithm  4.1.  In  steps  1  and  2,  the  gradient  and 
Hessian  are  approximated  by  finite  differences  using  Algorithms  A5.6.3  and  A5.6.2  in  Dennis  and  Schna¬ 
bel  [1983],  respectively.  The  algorithm  stops  if  ||  V/(xc)  |  i  2  ^  10-5  or  j  \  dc  |  |2  2  1CT’0.  Step  3  was  dis¬ 
cussed  in  Section  2;  45  degrees  is  used  for  the  angle  9  mentioned  there.  The  procedures  for  calculating 
Tc  and  Vc  in  step  4  also  were  discussed  in  Section  2. 

In  step  5.  we  first  determine  whether  the  tensor  model  has  an  acceptable  minimizcr  within  the  trust 
region,  and  if  so,  we  select  this  point  as  the  solution  to  step  5.  Otherwise  we  solve  the  trust  region  prob¬ 
lem  (3.7)  by  a  penalty  method,  as  discussed  in  Section  3,  resulting  in  a  candidate  step  d.  Then  we  decide 
whether  to  accept  xc+d  as  the  solution  to  step  5,  update  the  trust  radius,  and  possibly  repeat  this  process 
until  an  acceptable  point  xc+dr  is  found.  In  step  6,  we  follow  the  exact  same  procedure  except  that  we 
only  use  the  first  three  terms  of  the  model.  The  procedure  for  determining  whether  the  candidate  step  is 
acceptable  in  these  trust  region  algorithms,  and  for  updating  the  trust  region,  is  identical  to  Algorithm 
A6.4.5  in  Dennis  and  Schnabel  [1983],  except  that  :  1)  every  occurrence  of  initslope  is  changed  to 
A fpred,  where  A fpred  is  the  difference  of  the  values  of  the  model  being  used  (tensor  or  quadratic)  at  the 
candidate  point  and  at  xc  \  2)  steps  (9c.  1-2)  of  Algorithm  A6.4.5  are  replaced  by  setting  A fpred  to  this 
same  value. 
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5.  Test  Results 

We  have  tested  the  tensor  algorithm  described  in  Section  4  on  a  variety  of  nonsingular  and  singular 
problems.  We  compared  it  to  an  algorithm  that  is  identical  except  that  the  third  and  fourth  order  terms  Tc 
and  Vc  are  always  zero.  That  is,  the  comparison  algorithm  is  a  finite  difference  Newton’s  method,  whose 
global  strategy  is  a  trust  region  problem  solved  by  a  penalty  method.  In  this  section  we  summarize  our 
test  results.  The  details  of  our  computational  results  are  provided  in  the  appendix.  All  our  computations 
were  performed  on  a  MIPS  computer,  using  double  precision  arithmetic. 

First  we  tested  our  algorithms  on  the  set  of  unconstrained  optimization  problems  in  Mord,  Garbow 
and  Hillstrom  [1981  ].  All  these  problems  except  the  Powell  singular  problem  have  V2/  (*. )  nonsingular. 
The  dimensions  of  the  problems  range  from  2  to  30. 

Then  we  created  singular  test  problems  by  modifying  the  nonsingular  test  problems  of  Mord,  Gar- 
bow  and  Hillstrom  [1981].  All  of  the  unconstrained  optimization  test  problems  in  that  paper  are  obtained 
by  taking  a  system  of  nonlinear  equations 

F(x)r  =  </,  (x  ),■•,  fm(x))  (5.1) 

where  m  >n  and  each /,  :Rn  —*R ,  and  setting 

f(x)  =  F(x)TF(x)  =  £f,2(x).  (5.2) 

In  most  cases,  F(x)  =  0  at  the  minimizer  x> ,  and  F  (x- )  is  nonsingular.  In  these  cases,  Schnabel  and 
Frank  [1984]  showed  how  to  create  singular  systems  of  nonlinear  equations  from  (5.1),  by  forming 

F( x)  =  F(x)  -  F  (x>  )A  (A7  A  )~lAr(x-x- )  ,  (5.3) 

where  AeRnx k  has  full  column  rank  with  1£1: <n .  Thus  F(x>)=0  and  F'( x- )  has  rank  n-k.  To  create  a 
singular  unconstrained  optimization  problem,  we  simply  define  the  function 

f(x)  =  'AF(x)TF(x).  (5.4) 

From  (5.4)  and  F(x.  )=0,  we  have  Vf  (x- )  =  F'(x-  )T F (x- )  =  0.  From 
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/'(*.)  =  F’(x.)[I-A(At  A)~'At]  (5.5 ) 

and 

V2/(x. )  =  F\x.  )tF’( x.  )  +  p, (. x .  )V2/,  (x. )  =  F'(x.  )T F'(x* )  (5.6) 

we  know  that  V2/ (x. )  has  rank  a  . 

By  using  (5.3-4),  we  created  two  sets  of  singular  problems,  with  V2/  (x. )  having  rank  n-1  and 
n-2,  by  using 

Ae/?"xl  ,  At=(1,1 . 1) 

and 

i  ,  rnx2  aT  —  1  1  1  1  1 

AeR  ,  A  -!_!!_]  ...  (_,y 

respectively.  We  tested  our  methods  on  the  singular  versions  of  all  the  nonsingular  problems  except  the 
Gaussian  function  and  the  Gulf  research  and  development  function,  which  we  excluded  because  their 
nonsingular  versions  never  converged  to  a  minimize."  using  either  standard  or  tensor  methods. 

Our  computational  results  for  the  sets  of  test  problems  whose  Hessians  at  the  minimizers  have 
ranks  n,  n- 1,  and  n-2  are  summarized  in  Tables  5. 1-5.3,  and  given  in  detail  in  Appendices  A1-A3, 
respectively.  For  each  problem  set.  Tables  5. 1-5.3  compare  the  performance  of  the  standard  method  to 
two  versions  of  the  tensor  method:  the  one  described  in  Sections  2-4  where  the  number  of  past  points 
interpolated,  p .  is  selected  at  each  iteration  to  be  between  1  and  n  |/3,  and  a  second  version  where  p  is  res¬ 
tricted  to  be  .  at  all  iterations.  We  tested  the  second  version  because  we  observed  that  the  first  version 
generally  chose  p  =  1  anyhow,  and  because  the  tensor  method  is  considerably  simpler  to  implement,  and 
is  cheaper  in  terms  of  storage  and  cost  per  iteration,  whenp  =  1. 

Tables  5. 1-5.3  summarize  the  comparative  costs  of  the  standard  and  tensor  methods  using  ratios  of 
two  measures:  iterations,  and  function  and  derivative  evaluations.  The  iteration  ratio  is  the  total  number 
of  iterations  required  by  the  tensor  method  on  all  problems  that  were  successfully  solved  by  both 
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methods,  divided  by  the  total  number  of  iterations  required  by  the  standard  method  on  these  problems. 
The  second  ratio  is  based  upon  the  total  number  of  function  evaluations  required  to  solve  each  problems, 
including  those  for  finite  difference  gradients  and  Hessians  (i.e.  we  count  n  function  evaluations  per  gra¬ 
dient  evaluation  and  (n2+3n)/2  function  evaluations  per  Hessian  evaluation).  The  ratio  reported  is  the 
total  of  these  numbers  for  the  tensor  method  over  all  problems  that  were  successfully  solved  by  both 
methods,  divided  by  the  total  of  these  numbers  for  the  standard  method  over  the  same  problems.  Tables 
5. 1-5.3  also  contain,  for  that  test  set,  the  number  of  problems  where  the  performance  of  the  tensor  method 
was  better,  worse,  or  the  same  as  the  standard  method.  Here  better  is  defined  as  at  least  5%  better  in  the 
function  evaluation  measure,  worse  is  defined  as  at  least  5%  worse  in  the  function  evaluation  measure, 
and  the  remaining  problems  are  considered  the  same. 

The  statistics  in  Tables  5. 1-5.3  only  pertain  to  test  problems  that  were  solved  successfully  by  both 
the  standard  method  and  the  tensor  method.  Table  5.4  shows,  for  each  test  set.  how  many  problems  were 
solved  successfully  by  the  tensor  method  but  not  by  the  standard  method,  and  vice  versa. 

In  summary.  Tables  5. 1-5.4  show  that  both  the  p>l  and  the  p= 1  versions  of  the  tensor  method  have 
a  large  advantage,  in  both  reliability  and  efficiency,  over  the  standard  method  on  all  three  test  sets.  In 
each  of  the  six  comparisons,  a  substantial  portion  of  the  test  problems  (between  16%  and  22%)  are  solved 
by  the  tensor  method  and  not  the  standard  method,  while  only  two  problems  in  the  nonsingular  sets  and 
none  in  the  singular  sets  are  solved  by  the  standard  method  and  not  the  tensor  method  In  addition,  on  the 
problems  solved  by  both  methods  (between  43  and  50  problems  in  each  of  the  six  cases),  the  average  cost 
of  the  tensor  method,  measured  in  iterations  or  function  and  derivative  evaluations,  is  generally  slightly 
less  than  half  of  the  cost  of  the  standard  method.  Finally,  the  improvements  by  the  tensor  method  are 
quite  consistent.  Totaling  all  our  tests,  the  tensor  method  is  worse  than  the  standard  method  in  8%  of  the 
test  cases  (28  of  352),  better  in  87.5%  (308  of  352),  and  the  same  in  4.5%  (16  of  352). 

The  performances  of  the  version  of  the  tensor  method  which  constrains  p  to  be  1  and  the  version 
that  allows  p  to  be  between  1  and  nv3  are  rather  similar  overall,  with  the  p  =  1  version  actually 
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Table  5.1  —  Summary  of  Test  Results  for  Nonsingular  Problems 


method 

tensor/std.(itn) 

tensor/std.(fcn) 

tensor  better 

std.  better 

tie 

P=  1 

0.496 

0.580 

36 

5 

3 

p>l 

0.468 

0.488  1 

34 

5 

MM 

Table  5 2  --  Summary  of  Test  Results  for  Rank  n-1  Singular  Problems 


method 

tensor/std.(itn) 

tensor/std.(fcn) 

tensor  better 

std.  better 

tie 

P- 1 

0.465 

0.400 

42 

3 

2 

p>l 

0.479 

0.466 

40 

5 

2 

Table  53  -  Summary  of  Test  Results  for  Rank  n-2  Singular  Problems 


method 

tensor/std.(itn) 

tensor/std.(fcn) 

tensor  better 

std.  better 

tie 

p=l 

0.449 

0.390 

45 

1 

3 

p>l 

0.489 

0.466 

43 

5 

Table  5.4  --  Number  of  Problems  Solved  by  Tensor/Standard  Method  Only 


method 

nonsingular 

singular(rank  n-1) 

singular* rank  n-2) 

p=l 

13/2 

9/0 

13/0 

p>l 

11/2 

12/0 

10/0 

performing  somewhat  better  overall  on  the  singular  test  sets  and  the  p  21  version  performing  somewhat 
better  on  the  nonsingular  test  set.  One  reason  for  their  similarity  is  that  even  when  we  allow  p  >1,  we 
have  found  that  our  criterion  for  selecting  past  iterates  to  interpolate  generally  results  in  p- 1.  Over  all 
our  test  problems,  we  found  that  the  p>  1  method  selected  p=l  85%  of  the  time,  p=  2  15%,  and  p> 2 
0.35%.  Thus  it  appears  that  the  advantages  of  the  tensor  method  may  be  achieved  by  using  p= 1,  which 
would  mean  that  the  extra  cost  of  storing  and  forming  the  tensor  model  would  be  very  low,  and  that  the 
method  would  be  quite  simple  to  implement.  In  particular,  using  p=l  has  the  advantage  that  the  formulas 
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for  Tc  and  Vc  are  readily  available  in  closed  form  in  terms  of  the  function  and  derivative  values  being 
interpolated,  and  that  solving  the  tensor  model  reduces  to  minimizing  a  fourth  order  polynomial  in  one 
variable  which  also  can  be  done  in  closed  form. 

In  our  tests,  the  global  portion  of  the  tensor  method  (steps  5-7  of  Algorithm  4.1)  selected  the  step 
from  the  quadratic  model  about  20%  of  the  time  on  the  average.  While  this  is  a  rather  small  percentage, 
the  performance  of  the  tensor  method  is  improved  significantly  by  allowing  this  possibility. 

We  do  not  claim  to  fully  understand  why  the  tensor  method  performs  so  much  more  efficiently  and 
reliably  than  a  state  of  the  art  standard  method  in  our  tests.  What  is  especially  surprising  is  that  the 
improvements  are  attained  by  incorporating  a  small  amount  of  extra  information,  usually  just  the  function 
in  gradient  from  the  previous  iterate,  into  the  model.  Apparently,  having  a  more  accurate  model  in  the 
direction  of  the  previous  step  is  especially  useful  in  practice. 

The  computational  advantage  of  the  tensor  method  is  probably  not  due  to  an  improved  rate  of  con¬ 
vergence,  except  when  rank(V2/  ( x • ))-  n- 1 .  In  particular,  when  V2/ (x. )  is  nonsingular  and  n  >  1 .  it  is 
highly  unlikely  that  the  convergence  rate  of  the  tensor  method  is  different  than  the  quadratic  rate  of  the 
standard  method.  (It  is  easy  to  show  that  the  tensor  method  is  at  least  quadratically  convergent  in  this 
case  because  the  influence  of  the  tensor  terms  vanishes  asymptotically.)  In  the  case  when  V2/  (x* )  has 
rank  n-l,  we  conjecture  that  the  convergence  rate  of  the  tensor  method  is  again  better  than  the  linear  con¬ 
vergence  of  the  standard  method,  as  was  shown  by  Frank  [  1 984]  for  the  tensor  method  for  nonlinear  equa¬ 
tions.  We  have  not  yet  attempted  to  prove  this,  except  in  the  case  n= 1  where  it  is  straightforward  to  show 

that  the  tensor  method  converges  with  order  =<  1.2  (Chow  [1989]).  We  did  measure  the  ratios  of 

the  errors  of  successive  iterates  on  our  test  problems  with  rank  V2/ (x» )  =  n- 1.  An  example  is  given  in 
Table  5.5.  We  see  that  the  standard  method  converges  linearly  with  constant  =  2/3,  as  predicted  by  the 
theory,  and  that  the  tensor  method  appears  to  be  converging  faster  than  linearly.  (An  interesting  feature 
of  this  example  is  that  iterations  2  and  5  of  the  tensor  method  increase  the  error  in  x,  even  though  the 
function  value  decreases.  We  noticed  such  behavior  by  the  tensor  method  on  several  test  problems. 


25 


although  for  most  it  did  not  occur.)  When  rank(V2/  (x. ))  =  n- 2,  the  tensor  method  does  not  have  enough 
information  to  prove  a  faster  than  linear  convergence  rate,  since  it  usually  uses  p  =1. 


Table  5.5  -  Speed  of  Convergence  on  a  Typical  Problem  with  Rank  V2/  (*. )  =  n-1 
(Singular  version  of  variably  dimensioned  function,  n  =  10,  started  from  x  0,  using  p=  1) 


Numbers  in  the  second  and  the  third  columns  are 


\Xk-X- 


Finally,  we  have  also  implemented  a  line  search  version  of  the  tensor  method  and  compared  it  to  an 
algorithm  using  the  standard  quadratic  model  and  the  same  line  search.  We  found  that,  on  the  average, 
the  performances  of  the  line  search  and  trust  region  versions  of  the  quadratic  model  algorithms  were  very 
similar,  and  that  the  line  search  version  of  the  tensor  method  was  almost  15%  less  efficient  than  the  trust 
region  tensor  method.  (See  Chow  [  1989]  for  details.)  We  observed  that  the  global  strategy  is  only  able  to 
use  a  tensor  method  step  about  60%  of  the  time  in  the  line  search  tensor  method  versus  about  80%  of  the 
time  in  ihe  trust  region  version.  This  may  be  related  to  the  difference  in  their  performances.  But  the  line 
search  tensor  method  still  improves  by  a  large  amount  over  the  standard  method. 

6.  Summary  and  Future  Research  Directions 

We  have  presented  a  tensor  method  for  unconstrained  optimization  that  bases  each  iteration  upon  a 
fourth  order  model  of  the  objective  function.  This  model  interpolates  the  function  value,  gradient,  and 
Hessian  of  the  objective  function  at  the  current  iterate,  and  forms  its  third  and  fourth  order  terms  by  inter¬ 
polating  values  of  the  function  and  gradient  at  previous  iterates.  The  costs  of  storing,  forming,  and  using 
the  model  are  not  significantly  more  than  for  a  standard  method  that  uses  a  quadratic  Taylor  series  model. 

The  computational  results  of  Section  5  show  that  the  tensor  method  is  substantially  more  reliable 
and  efficient  than  the  corresponding  standard  method  on  both  the  nonsingular  and  singular  problems  that 
we  tested.  This  experience  indicates  that  the  tensor  method  may  be  preferable  to  methods  available  in 
software  libraries  for  solving  small  to  medium  sized  unconstrained  optimization  problems,  in  cases  when 
analytic  or  finite  difference  Hessian  matrices  are  available.  Obviously,  more  computational  experience  is 
necessary  to  determine  this  conclusively.  To  facilitate  this  process,  we  are  developing  a  software  package 
that  implements  a  tensor  method  for  unconstrained  optimization  using  analytic  or  finite  difference  second 
derivatives,  and  will  make  it  available  shortly.  Our  software  package  restricts  p ,  the  number  of  past 
iterates  whose  function  and  gradient  values  are  interpolated  at  each  iteration,  to  be  one.  The  reasons  for 
this  choice  are  that  our  computational  results  show  that  the  tensor  method  with  p= 1  is  generally  about  as 
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effective  as  the  method  that  allows  p  >1,  and  that  the  method  is  considerably  simpler  and  cheaper  to 
implement  in  this  case.  Initially  it  will  use  a  line  search  railier  than  a  trust  region,  because  the  line  search 
tensor  method  is  currently  much  easier  to  understand,  and  much  faster  on  small,  inexpensive  problems, 
than  the  trust  region  version,  while  still  leading  to  large  savings  in  iterations  and  function  and  derivative 
evaluations  on  our  test  problems.  A  trust  region  version  may  be  added  to  the  package  later. 

Several  interesting  research  topics  remain  concerning  the  tensor  method  described  in  this  paper.  As 
indicated  above,  the  development  of  a  simple,  efficient  method  for  approximately  solving  the  trust  region 
problem  using  the  tensor  method  would  be  very  useful.  Chow  [1989]  has  developed  a  fairly  efficient,  but 
complex  method  for  solving  the  trust  region  problem  (3.7)  when  p=l;  the  question  of  how  to  solve  this 
problem  efficiently  when  p>  1  remains  open.  It  would  also  be  nice  to  develop  an  effective  global  strategy 
that  does  not  require  the  determination  of  the  step  using  both  the  tensor  model  and  the  quadratic  model  at 
each  iteration.  Finally,  as  we  mentioned  in  Section  5,  the  local  convergence  analysis  in  the  case  n  >  1 
remains  open. 

The  standard  and  tensor  methods  discussed  in  this  paper  both  assume  that  the  analytic  or  finite 
difference  Hessian  is  available  at  each  iteration.  Often  in  practical  applications,  however,  the  analytic 
Hessian  is  not  available,  and  it  is  expensive  to  calculate  by  finite  differences,  so  secant  (quasi-Newton) 
methods  are  used  instead.  These  methods  are  based  on  a  quadratic  model  that  is  formed  solely  from  func¬ 
tion  and  gradient  values  at  the  iterates  (see  e.g.  Dennis  and  Mord  [1977],  Fletcher  [1980],  Dennis  and 
Schnabel  [1983]).  We  are  developing  a  secant  tensor  method  for  unconstrained  optimization  that  bases 
each  iteration  upon  a  fourth  order  model  that  is  also  determined  solely  from  function  and  gradient  values 
at  the  iterates.  This  work  is  described  in  Chow  [1989]  and  in  a  forthcoming  paper. 
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Appendix  A 


Test  Results  for  the  Standard  and  Tensor  Methods 


The  columns  in  Tables  A.l  -  A.3  have  the  following  meanings  : 


Function:  name  of  the  problem. 
n :  dimension  of  the  problem. 

xo :  starting  point  (from  Mord,  Garbow  and  Hillstrom  [1981]).  1,  10  and  100  stand  for  x0.  10xo  and 
lOOxo,  respectively. 

itns:  number  of  iterations. 

fens:  number  of  function  evaluations  (including  the  necessary  function  evaluations  for  finite  difference 
gradients  and  Hessians). 

x* :  two  methods  converge  to  the  same  minimizer  if  and  only  if  they  have  the  same  letter  in  this  column. 

The  abbreviations  OF,  OL  and  NC  stand  for  overflow,  over  iteration  limit,  and  convergence  to  a 
nonminimizer,  respectively.  The  iteration  limit  was  120. 
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Table  A.l  --  Test  Results  for  the  Standard  and  Tensor  Methods 
on  Nonsingular  Problems 


Function 

n 

Xo 

Tensor  (p>\) 

Tensor  (p  =  l) 

Standard 

fens 

itns 

fens 

x. 

X* 

Rosenbrock 

2 

1 

15 

150 

a 

146 

a 

22 

183 

a 

10 

35 

345 

a 

345 

a 

64 

542 

a 

100 

86 

833 

a 

795 

a 

OL 

- 

- 

10 

i 

21 

1724 

a 

21 

1683 

a 

21 

1615 

a 

10 

OF 

- 

- 

OF 

- 

- 

73 

5605 

a 

100 

OF 

- 

- 

OL 

- 

- 

OL 

- 

- 

30 

1 

21 

11640 

a 

25 

13240 

a 

22 

11610 

a 

10 

OF 

- 

- 

83 

44067 

a 

92 

48484 

a 

100 

OF 

- 

- 

OL 

- 

- 

OL 

- 

- 

Wood 

4 

1 

30 

629 

a 

30 

629 

a 

62 

1230 

a 

10 

32 

674 

a 

34 

723 

a 

70 

1391 

a 

100 

OL 

- 

- 

OF 

- 

- 

OL 

- 

Helical 

3 

1 

11 

170 

a 

9 

138 

a 

13 

175 

a 

valley 

10 

14 

213. 

a 

14 

215 

a 

16 

216 

a 

100 

13 

194 

a 

13 

194 

a 

15 

203 

a 

Trigono¬ 

2 

1 

4 

40 

a 

4 

40 

a 

4 

37 

a 

metric 

10 

6 

62 

a 

6 

62 

a 

7 

59 

a 

100 

4 

38 

a 

4 

38 

a 

5 

43 

a 

10 

1 

6 

553 

a 

7 

555 

a 

19 

1467 

a 

10 

9 

708 

a 

9 

708 

a 

50 

3820 

a 

100 

40 

3106 

a 

43 

3336 

a 

32 

2446 

a 

Beale 

2 

1 

7 

71 

a 

7 

71 

a 

7 

62 

a 

10 

12 

120 

a 

8 

78 

a 

OL 

. 

- 

100 

OL 

- 

- 

OF 

- 

- 

NC 

- 

- 

Brown  and 

4 

1 

13 

272 

a 

13 

271 

a 

18 

347 

a 

Dennis 

10 

16 

333 

a 

14 

292 

a 

24 

461 

a 

100 

22 

469 

a 

21 

440 

a 

40 

778 

a 

Brown  badly 

2 

1 

OL 

_ 

_ 

OF 

_ 

OL 

_ 

_ 

scaled 

10 

OL 

- 

- 

OL 

- 

. 

OL 

- 

100 

OL 

- 

- 

OL 

- 

- 

OL 

- 

- 

Box  three 

3 

1 

12 

183 

a 

13 

200 

a 

21 

290 

a 

dimensional 

10 

19 

305 

a 

23 

359 

a 

59 

827 

b 

100 

20 

325 

a 

17 

281 

b 

OL 

* 

- 

31 


Table  A.l  (continued) 


Penalty  I 

4 

1 

10 

211 

a 

9 

190 

b 

33 

644 

c 

10 

10 

209 

a 

11 

229 

a 

39 

757 

a 

100 

14 

293 

a 

13 

272 

a 

43 

829 

a 

10 

1 

15 

1179 

a 

14 

1101 

a 

35 

2680 

a 

10 

11 

868 

a 

9 

707 

a 

40 

3059 

a 

100 

23 

1811 

a 

24 

1889 

a 

52 

3988 

a 

30 

1 

19 

10058 

a 

20 

10577 

a 

36 

18973 

a 

10 

22 

11640 

a 

26 

13761 

a 

44 

23189 

a 

100 

29 

15337 

a 

31 

16391 

a 

99 

52174 

a 

Penalty  II 

4 

1 

7 

151 

a 

5 

108 

a 

120 

2285 

b 

10 

13 

274 

a 

20 

416 

b 

OL 

- 

- 

100 

44 

913 

a 

29 

605 

b 

OL 

- 

- 

Variably 

4 

1 

7 

153 

a 

7 

153 

a 

10 

195 

a 

dimensioned 

10 

7 

150 

a 

7 

148 

a 

11 

214 

a 

100 

13 

272 

a 

13 

271 

a 

18 

348 

a 

10 

I 

10 

794 

a 

11 

863 

a 

16 

1229 

a 

10 

9 

936 

a 

10 

786 

a 

20 

1534 

a 

100 

18 

1492 

a 

17 

1338 

a 

32 

2447 

a 

30 

1 

18 

9540 

a 

10 

5306 

a 

33 

17391 

a 

10 

17 

10056 

a 

17 

9002 

a 

40 

21073 

a 

100 

56 

30641 

a 

OL 

- 

- 

112 

58956 

a 

Biggs 

6 

1 

27 

972 

a 

28 

1005 

a 

OL 

_ 

EXP6 

10 

83 

2987 

a 

34 

1223 

b 

OL 

- 

* 

100 

25 

914 

a 

26 

937 

b 

OL 

• 

- 

Chebyquad 

6 

1 

6 

221 

a 

6 

221 

a 

14 

484 

a 

10 

34 

1214 

a 

29 

1033 

b 

OL 

- 

- 

100 

50 

1785 

a 

53 

1885 

b 

OL 

- 

- 

20 

1 

14 

3815 

a 

16 

4064 

a 

OL 

_ 

10 

OF 

- 

- 

95 

24102 

a 

OL 

- 

- 

100 

OF 

- 

- 

104 

26385 

a 

NC 

- 

- 

Watson 

6 

1 

11 

403 

a 

11 

397 

a 

i9 

658 

a 

20 

1 

OF 

- 

- 

NC 

- 

- 

OL 

- 

- 

32 


Table  A.2  —  Test  Results  for  the  Standard  and  Tensor  Methods 
on  Singular  (rank  n-1)  Problems 


Function 

n 

*0 

Tensor  (p SI) 

Tensor  (p=l) 

Standard 

itns 

fens 

JC* 

itns 

fens 

x» 

■BM 

MM 

Rosenbrock 

2 

1 

31 

305 

a 

31 

305 

a 

70 

a 

10 

39 

383 

a 

39 

383 

a 

93 

764 

a 

100 

60 

601 

a 

58 

579 

a 

OL 

- 

- 

10 

1 

11 

945 

a 

15 

1181 

b 

15 

1162 

c 

10 

25 

2049 

a 

27 

2125 

b 

27 

2071 

c 

100 

29 

2272 

a 

41 

3224 

b 

66 

5060 

c 

30 

1 

6 

3202 

a 

6 

3202 

a 

60 

31598 

b 

10 

12 

6368 

a 

12 

6367 

b 

23 

12137 

b 

100 

21 

11647 

a 

21 

11118 

a 

35 

18453 

b 

Wood 

4 

1 

25 

519 

a 

25 

519 

a 

37 

708 

a 

10 

34 

713 

a 

36 

758 

a 

52 

1001 

a 

100 

77 

1576 

a 

58 

1200 

a 

OL 

- 

- 

Helical 

3 

1 

9 

147 

a 

9 

138 

a 

13 

180 

a 

valley 

10 

15 

229 

a 

15 

222 

a 

25 

334 

a 

100 

20 

319 

a 

16 

243 

a 

26 

345 

a 

Trigono- 

2 

] 

6 

59 

a 

6 

59 

a 

12 

102 

a 

metric 

10 

5 

50 

a 

5 

50 

a 

9 

75 

a 

100 

6 

58 

a 

6 

58 

a 

9 

76 

a 

10 

1 

8 

630 

a 

7 

553 

a 

14 

1085 

a 

10 

14 

1097 

a 

14 

1098 

a 

15 

1159 

a 

100 

20 

1639 

a 

18 

1399 

a 

NC 

- 

- 

Beale 

2 

1 

6 

64 

a 

7 

74 

a 

6 

52 

a 

10 

10 

100 

a 

10 

100 

a 

47 

396 

b 

100 

64 

679 

a 

23 

231 

a 

44 

364 

a 

Brown  and 

4 

1 

12 

254 

a 

13 

274 

a 

19 

366 

a 

Dennis 

10 

16 

337 

a 

14 

293 

a 

24 

461 

a 

100 

20 

421 

a 

21 

443 

a 

40 

778 

a 

Brown  badly 

2 

1 

OF 

_ 

OF 

. 

NC 

_ 

scaled 

10 

OL 

- 

- 

OF 

- 

- 

NC 

- 

- 

100 

OL 

- 

- 

OF 

- 

- 

NC 

- 

- 

Box  three 

3 

1 

9 

136 

a 

18 

270 

b 

9 

121 

c 

dimensional 

10 

11 

169 

a 

28 

423 

b 

83 

1083 

c 

100 

28 

430 

a 

25 

380 

a 

31 

412 

b 

33 


Table  A. 2  (continued) 


Penalty  I 

4 

1 

4 

84 

a 

4 

84 

a 

15 

290 

a 

10 

4 

84 

a 

4 

84 

a 

20 

385 

a 

100 

7 

147 

a 

7 

147 

a 

26 

499 

a 

10 

1 

4 

318 

a 

4 

318 

a 

18 

1381 

a 

10 

7 

550 

a 

7 

550 

a 

24 

1838 

a 

100 

17 

1341 

a 

17 

1341 

a 

35 

2685 

a 

30 

1 

7 

3722 

a 

10 

5300 

a 

22 

11605 

a 

10 

17 

9001 

a 

16 

8471 

a 

29 

15292 

a 

100 

20 

10584 

a 

18 

9532 

a 

86 

45334 

a 

Penalty  II 

4 

1 

6 

130 

a 

4 

88 

a 

57 

1098 

b 

10 

11 

229 

a 

11 

229 

a 

13 

252 

b 

100 

17 

354 

a 

15 

313 

a 

OL 

- 

- 

Variably 

4 

1 

4 

91 

a 

4 

87 

a 

17 

328 

a 

dimensioned 

10 

9 

192 

a 

11 

229 

a 

19 

336 

a 

100 

17 

359 

a 

18 

372 

a 

25 

480 

a 

10 

1 

11 

1020 

a 

16 

1254 

n 

24 

1837 

a 

10 

19 

1479 

a 

17 

1332 

a 

27 

2066 

a 

100 

20 

1712 

a 

21 

1641 

a 

40 

3055 

a 

30 

1 

48 

25382 

a 

23 

12170 

a 

41 

21599 

a 

10 

OF 

- 

36 

19039 

a 

OL 

- 

100 

OF 

- 

- 

OF 

* 

- 

OL 

- 

- 

Biggs 

6 

1 

83 

2962 

a 

OF 

_ 

OL 

. 

_ 

EXP6 

10 

74 

2694 

a 

63 

2246 

b 

OL 

• 

- 

100 

OF 

- 

- 

OF 

- 

OL 

' 

- 

Chebyquad 

6 

1 

9 

332 

a 

10 

365 

a 

92 

3135 

b 

10 

22 

789 

a 

19 

683 

b 

OL 

- 

- 

100 

35 

1245 

a 

22 

785 

b 

OL 

- 

- 

20 

1 

29 

7349 

a 

8 

2045 

b 

OL 

_ 

10 

26 

7349 

a 

OF 

- 

“ 

OL 

- 

- 

100 

50 

13903 

a 

57 

14441 

a 

OL 

- 

- 

Watson 

6 

1 

8 

295 

a 

8 

295 

a 

8 

297 

a 

20 

1 

24 

6099 

a 

28 

7100 

OL 

- 

34 


Table  AJ  --  Test  Results  for  the  Standard  and  Tensor  Methods 
or  Singular  (rank  n-2)  Problems 


Function 

n 

Tensor  (p>l) 

Tensor  (p  =  1) 

Standard 

KiH 

feus 

X* 

fens 

X • 

itns 

fens 

Rosenbrock 

2 

1 

7 

68 

a 

7 

71 

a 

16 

131 

10 

5 

52 

a 

5 

52 

a 

21 

171 

100 

11 

HI 

a 

13 

128 

b 

26 

211 

10 

1 

11 

872 

a 

11 

872 

a 

14 

1084 

10 

33 

2718 

a 

25 

1959 

b 

29 

2233 

100 

25 

2007 

a 

29 

2330 

a 

45 

3473 

30 

1 

11 

5841 

a 

11 

5841 

a 

21 

11083 

10 

23 

12172 

a 

25 

1959 

a 

OL 

- 

100 

OF 

- 

- 

59 

i 

31247 

a 

OL 

- 

Wood 

4 

1 

13 

277 

a 

13 

275 

a 

19 

366 

10 

16 

339 

a 

18 

378 

b 

24 

461 

100 

OF 

- 

- 

OF 

- 

- 

NC 

- 

Helical 

3 

1 

15 

222 

A 

13 

196 

a 

41 

541 

valley 

10 

21 

316 

19 

281 

a 

49 

648 

100 

22 

328 

a 

21 

315 

a 

47 

621 

Trigono- 

2 

1 

4 

38 

a 

4 

38 

a 

8 

67 

metric 

10 

6 

62 

a 

6 

62 

a 

10 

83 

100 

7 

70 

a 

7 

70 

a 

8 

67 

10 

1 

6 

553 

a 

7 

554 

a 

15 

1161 

10 

11 

941 

a 

11 

864 

a 

15 

1159 

100 

NC 

- 

~ 

NC 

- 

- 

NC 

- 

Beale 

2 

1 

6 

65 

a 

6 

65 

a 

10 

84 

10 

10 

101 

a 

9 

95 

b 

10 

84 

100 

22 

235 

a 

20 

217 

a 

NC 

- 

Brown  and 

4 

1 

12 

253 

a 

13 

271 

a 

18 

347 

Dennis 

10 

17 

355 

a 

14 

292 

a 

24 

461 

100 

20 

424 

a 

20 

419 

a 

40 

778 

Brown  badly 

2 

1 

NC 

_ 

NC 

_ 

NC 

scaled 

10 

NC 

- 

- 

NC 

- 

- 

NC 

- 

100 

NC 

- 

- 

NC 

1 

- 

NC 

- 

Box  three 

3 

1 

16 

248 

a 

11 

172 

b 

16 

200 

dimensional 

10 

22 

331 

a 

13 

202 

a 

22 

304 

100 

48 

i 

763 

a 

40 

633 

b 

46 

637 

x*  i 
~b 
b 
c 

b 
b 
a 

a 


b 

c 


a 

a 

a 

a 

a 

a 

a 

a 


b 

c 


a 

a 

a 


cr  cro 


35 


Table  A. 3  (continued) 


Penalty  1 

4 

1 

3 

64 

a 

3 

64 

a 

15 

290 

a 

10 

4 

84 

a 

4 

84 

a 

20 

385 

a 

100 

7 

147 

a 

7 

147 

a 

26 

499 

a 

10 

1 

4 

318 

a 

4 

318 

a 

18 

1381 

a 

10 

6 

437 

a 

7 

555 

a 

24 

1838 

a 

100 

19 

1469 

a 

18 

1418 

a 

35 

2685 

a 

30 

1 

9 

4776 

a 

8 

4246 

a 

22 

11605 

a 

10 

17 

9002 

a 

15 

7944 

a 

29 

15292 

a 

100 

16 

8473 

a 

18 

9525 

a 

86 

45334 

a 

Penalty  II 

4 

1 

4 

88 

a 

4 

88 

a 

5 

100 

a 

10 

11 

230 

a 

12 

253 

a 

14 

271 

b 

100 

15 

314 

a 

16 

335 

b 

20 

385 

c 

Variably 

4 

1 

9 

192 

a 

9 

192 

a 

13 

254 

b 

dimensioned 

10 

9 

193 

a 

9 

191 

a 

42 

804 

b 

100 

18 

377 

a 

12 

250 

b 

OL 

- 

10 

1 

21 

1663 

a 

11 

868 

b 

50 

3818 

c 

10 

16 

1217 

a 

14 

1100 

b 

102 

7769 

c 

100 
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